Introduction¶

In this notebook, I will focus on geospatial techniques using crime data from Kansas City, MO. While the dataset offers extensive opportunities for analysis, this notebook will remain concise, providing a snapshot of key insights. The primary goal is to demonstrate proficiency with tools such as Folium, Plotly, Pandas, and NumPy.

About the Data: Kansas City Police Department (KCPD) Crime Data¶

This dataset contains detailed information about reported crimes in Kansas City, Missouri, for the year 2024. It is designed to support geospatial, temporal, and categorical analyses of crime patterns. You may notice that the map markers cluster on the Missouri (eastern) side of the metro with a sharp cutoff: Kansas City straddles the Kansas-Missouri state line, and Kansas City, Kansas is served by its own police department, so this dataset covers only the Missouri side. The data includes the type of crime, its location, involved individuals, and temporal details. This is real-world data, updated weekly, available at https://data.kcmo.org/Crime/KCPD-Crime-Data-2024/isbe-v4d8/about_data.

Columns and Descriptions¶

  • Report_No:

The unique identifier assigned to each crime report.

  • Reported_Date:

The date when the crime was officially reported.

  • Reported_Time:

The exact time when the crime was reported (in 24-hour format).

  • From_Date:

The start date of the crime incident, indicating when it was first observed.

  • From_Time:

The start time of the crime incident.

  • To_Date:

The end date of the crime incident, if applicable. Null values indicate ongoing or open cases.

  • To_Time:

The end time of the crime incident, if applicable.

  • Offense:

The general category of the offense (e.g., Assault, Burglary).

  • IBRS:

The code corresponding to the Incident-Based Reporting System, used for federal crime reporting.

  • Description:

A detailed description of the crime (e.g., Motor Vehicle Theft, Trespass).

  • Beat:

The patrol beat or subregion within the police department's jurisdiction.

  • Address:

The address where the crime occurred, anonymized to maintain privacy.

  • City:

The city where the crime occurred (e.g., Kansas City).

  • Zip Code:

The postal code of the crime's location.

  • Rep_Dist:

The reporting district, a smaller sub-area for statistical crime tracking.

  • Area:

The patrol division or geographic area within the police department (e.g., CPD, SCP).

  • DVFlag:

A Boolean flag indicating whether the crime is domestic violence-related.

  • Involvement:

The role of the individual involved in the crime (e.g., Victim (VIC), Suspect (SUS)).

  • Race:

The race of the involved individual (if applicable).

  • Sex:

The gender of the involved individual (e.g., Male (M), Female (F)).

  • Age:

The age of the involved individual.

  • Fire Arm Used Flag:

A Boolean flag indicating whether a firearm was involved in the incident.

  • Location:

The coordinates of the crime's location as a WKT POINT string, with longitude listed first (e.g., POINT (-94.5856 39.06537)).

  • Age_Range:

The age range of the involved individual, categorized into brackets (e.g., 18-24, 25-34).
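The Location column stores coordinates as WKT POINT strings, which list longitude before latitude. As a minimal sketch of how one such string can be parsed (the helper name `parse_point` is my own, not from the dataset or notebook):

```python
def parse_point(wkt: str):
    """Parse a WKT 'POINT (lon lat)' string into (longitude, latitude) floats."""
    lon, lat = wkt.replace('POINT ', '').strip('()').split()
    return float(lon), float(lat)

lon, lat = parse_point('POINT (-94.5856 39.06537)')
print(lon, lat)  # -94.5856 39.06537 — longitude first, latitude second
```

Keeping the longitude-first order straight matters later, because Folium expects `[latitude, longitude]` pairs.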

IMPORTING LIBRARIES AND DATA¶

In [413]:
!pip install folium branca
import folium
from folium.plugins import HeatMap
import matplotlib.pyplot as plt
import plotly.express as px
import numpy as np
import pandas as pd
import warnings 
import termcolor
import plotly.io as pio
pio.renderers.default = 'notebook_connected'

warnings.filterwarnings('ignore')
print('libraries installed!')
libraries installed!
In [414]:
df = pd.read_csv('KCPD_Crime_Data_2024_20241201.csv')

#Lift the default column display limit so every column renders when a dataframe is displayed.
pd.options.display.max_columns = None

#Ensure the dataset imported correctly.
df.head()
Out[414]:
Report_No Reported_Date Reported_Time From_Date From_Time To_Date To_Time Offense IBRS Description Beat Address City Zip Code Rep_Dist Area DVFlag Involvement Race Sex Age Fire Arm Used Flag Location Age_Range
0 KC24000270 01/02/2024 04:53 01/02/2024 04:53 NaN NaN Trespass of Real Property 90J Trespass of Real Property 113.0 1200 MAIN ST KANSAS CITY 64106.0 PJ1087 CPD False VIC NaN NaN NaN False NaN NaN
1 KC24000567 01/03/2024 16:30 12/31/2023 16:30 NaN NaN Stealing from Building/Residence 23H All Other Larceny 132.0 3400 MAIN ST KANSAS CITY 64111.0 PJ2753 CPD False VIC NaN NaN NaN False POINT (-94.5856 39.06537) NaN
2 KC24000877 01/04/2024 15:15 01/04/2024 15:15 NaN NaN Vehicular - Non-Injury Hit and Run NaN NaN 123.0 W I 670 HWY and W I 70 HWY KANSAS CITY NaN NaN CPD False SUS B M 19.0 False NaN 18-24
3 KC24001196 01/06/2024 00:48 01/06/2024 00:48 NaN NaN Stolen Auto 240 Motor Vehicle Theft 642.0 10300 N CHERRY DR KANSAS CITY NaN NaN SCP False CMP VIC W M 24.0 False POINT (-94.572100389 39.280825631) 18-24
4 KC24001447 01/07/2024 07:00 01/07/2024 07:00 NaN NaN Assault (Aggravated) 26C Impersonation 115.0 00 W PERSHING RD KANSAS CITY 64108.0 PJ1831 CPD False VIC OTH W M 56.0 False NaN 55-64

DATA PRE-PROCESSING¶

In [416]:
print(df.info())

print(df.isnull().sum())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 95932 entries, 0 to 95931
Data columns (total 24 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Report_No           95932 non-null  object 
 1   Reported_Date       95932 non-null  object 
 2   Reported_Time       95932 non-null  object 
 3   From_Date           95932 non-null  object 
 4   From_Time           95932 non-null  object 
 5   To_Date             31503 non-null  object 
 6   To_Time             31503 non-null  object 
 7   Offense             95932 non-null  object 
 8   IBRS                87350 non-null  object 
 9   Description         87350 non-null  object 
 10  Beat                95926 non-null  float64
 11  Address             95932 non-null  object 
 12  City                95932 non-null  object 
 13  Zip Code            91243 non-null  float64
 14  Rep_Dist            86051 non-null  object 
 15  Area                95929 non-null  object 
 16  DVFlag              95932 non-null  bool   
 17  Involvement         95932 non-null  object 
 18  Race                83724 non-null  object 
 19  Sex                 84772 non-null  object 
 20  Age                 69651 non-null  float64
 21  Fire Arm Used Flag  95932 non-null  bool   
 22  Location            93795 non-null  object 
 23  Age_Range           69651 non-null  object 
dtypes: bool(2), float64(3), object(19)
memory usage: 16.3+ MB
None
Report_No                 0
Reported_Date             0
Reported_Time             0
From_Date                 0
From_Time                 0
To_Date               64429
To_Time               64429
Offense                   0
IBRS                   8582
Description            8582
Beat                      6
Address                   0
City                      0
Zip Code               4689
Rep_Dist               9881
Area                      3
DVFlag                    0
Involvement               0
Race                  12208
Sex                   11160
Age                   26281
Fire Arm Used Flag        0
Location               2137
Age_Range             26281
dtype: int64

Observations

  • The time and date columns are not in datetime format.

  • Many columns have a large number of null values that need to be accounted for.

In [418]:
#Ensure the dates are in datetime format for accurate temporal analysis.
df['Reported_Date'] = pd.to_datetime(df['Reported_Date'])
df['From_Date'] = pd.to_datetime(df['From_Date'])

#Strip the colons to simplify the time conversion.
df[['Reported_Time', 'From_Time']] = df[['Reported_Time', 'From_Time']].replace(':', '', regex=True)

#Convert integer to string with leading zeros and format to HH:MM
df['Reported_Time'] = df['Reported_Time'].apply(lambda x: f"{int(x):04d}")  #Ensure 4-digit format
df['From_Time'] = df['From_Time'].apply(lambda x: f"{int(x):04d}")  #Ensure 4-digit format
df['Reported_Time'] = pd.to_datetime(df['Reported_Time'], format='%H%M').dt.time
df['From_Time'] = pd.to_datetime(df['From_Time'], format='%H%M').dt.time

df.head(1)
Out[418]:
Report_No Reported_Date Reported_Time From_Date From_Time To_Date To_Time Offense IBRS Description Beat Address City Zip Code Rep_Dist Area DVFlag Involvement Race Sex Age Fire Arm Used Flag Location Age_Range
0 KC24000270 2024-01-02 04:53:00 2024-01-02 04:53:00 NaN NaN Trespass of Real Property 90J Trespass of Real Property 113.0 1200 MAIN ST KANSAS CITY 64106.0 PJ1087 CPD False VIC NaN NaN NaN False NaN NaN
In [419]:
df[['To_Date','To_Time']].isnull().sum()
Out[419]:
To_Date    64429
To_Time    64429
dtype: int64

I'm making the assumption that these values are null due to the cases still being open.
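That assumption can be spot-checked: if null To_Date values mark open cases, offense categories that typically resolve quickly should show fewer nulls. A hedged sketch on a toy frame (column names match the dataset, but the rows here are invented):

```python
import pandas as pd

# Toy stand-in for the KCPD frame; the real df has the same columns.
toy = pd.DataFrame({
    'Offense': ['Stolen Auto', 'Stolen Auto', 'Trespass of Real Property'],
    'To_Date': [None, '01/05/2024', None],
})

# Share of null To_Date per offense; consistently high shares would be
# consistent with those case types tending to remain open.
null_share = toy['To_Date'].isnull().groupby(toy['Offense']).mean()
print(null_share)
```

Run against the real frame, this would show whether the nulls concentrate in offense types plausibly associated with open cases.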

In [421]:
#Fill nulls with a midnight placeholder so the column stays in a consistent datetime format; placeholder rows can be filtered out later for analysis accuracy.
df['To_Time'] = df['To_Time'].fillna('00:00:00')
In [422]:
df['To_Date'] = pd.to_datetime(df['To_Date'])

df['To_Time'] = df['To_Time'].replace(':', '', regex=True)

#Convert integer to string with leading zeros and format to HH:MM
df['To_Time'] = df['To_Time'].apply(lambda x: f"{int(x):04d}")  #Ensure 4-digit format
df['To_Time'] = pd.to_datetime(df['To_Time'], format='%H%M').dt.time
In [423]:
#Replace nulls in To_Date with a placeholder date
df['To_Date'] = df['To_Date'].fillna(pd.Timestamp('2222-01-01'))
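Because 2222-01-01 is a sentinel rather than a real date, any duration calculation should exclude those rows first. A small sketch with invented rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'From_Date': pd.to_datetime(['2024-01-02', '2024-01-03']),
    'To_Date':   pd.to_datetime(['2024-01-04', '2222-01-01']),
})

# Keep only genuinely closed cases before computing incident durations,
# otherwise the sentinel date produces ~198-year spans.
closed = toy[toy['To_Date'] != pd.Timestamp('2222-01-01')]
durations = closed['To_Date'] - closed['From_Date']
print(durations)
```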
In [424]:
df.isnull().sum()
Out[424]:
Report_No                 0
Reported_Date             0
Reported_Time             0
From_Date                 0
From_Time                 0
To_Date                   0
To_Time                   0
Offense                   0
IBRS                   8582
Description            8582
Beat                      6
Address                   0
City                      0
Zip Code               4689
Rep_Dist               9881
Area                      3
DVFlag                    0
Involvement               0
Race                  12208
Sex                   11160
Age                   26281
Fire Arm Used Flag        0
Location               2137
Age_Range             26281
dtype: int64
In [425]:
ibrs_with_nulls = df[df['IBRS'].isnull()]

ibrs_null_counts = ibrs_with_nulls.groupby('Offense')['IBRS'].size()
print(ibrs_null_counts) #Very difficult to discern why these values are missing.
Offense
Abuse of a Child                                                              13
Alcohol Influence Report                                                      23
Animal Bite                                                                   19
Assault (Aggravated)                                                          17
Assault (Aggravated) on Department Member/Outside Law Enforcement Officer      9
                                                                            ... 
Vehicular - Injury Hit and Run                                               154
Vehicular - Non-Injury                                                       522
Vehicular - Non-Injury Hit and Run                                           301
Violation of Ex-Parte Order of Protection                                     62
Violation of Full Order of Protection                                         79
Name: IBRS, Length: 100, dtype: int64
In [426]:
df['Rep_Dist'].value_counts()
Out[426]:
Rep_Dist
PP0321    1758
PJ3601    1271
PJ4990     635
PC0323     557
PJ2650     476
          ... 
PJ2037       1
PJ4689       1
PJ7058       1
PJ6925       1
PJ5160       1
Name: count, Length: 6532, dtype: int64
In [427]:
rdist_with_nulls = df[df['Rep_Dist'].isnull()]

rdist_null_counts = rdist_with_nulls.groupby('Area')['Rep_Dist'].size()
print(rdist_null_counts)
Area
CPD     2642
EPD     1443
MPD     1898
NPD     1001
OSPD     420
SCP     1041
SPD     1436
Name: Rep_Dist, dtype: int64
In [428]:
#Replacing null values in the columns with "UNSPECIFIED" because it will have minimal effect on this analysis.
df[['IBRS','Description','Beat','Zip Code','Rep_Dist','Area','Location']] = df[['IBRS','Description','Beat','Zip Code','Rep_Dist','Area','Location']].fillna('UNSPECIFIED')
In [429]:
df.isnull().sum()
Out[429]:
Report_No                 0
Reported_Date             0
Reported_Time             0
From_Date                 0
From_Time                 0
To_Date                   0
To_Time                   0
Offense                   0
IBRS                      0
Description               0
Beat                      0
Address                   0
City                      0
Zip Code                  0
Rep_Dist                  0
Area                      0
DVFlag                    0
Involvement               0
Race                  12208
Sex                   11160
Age                   26281
Fire Arm Used Flag        0
Location                  0
Age_Range             26281
dtype: int64
In [430]:
print(df['Race'].unique())
print(df['Sex'].unique())
print(df['Age'].unique())
[nan 'B' 'W' 'U' 'I']
[nan 'M' 'F' 'U']
[nan 19. 24. 56. 26. 45. 68. 42. 53. 61. 37. 36. 38. 20. 41. 30. 33. 66.
 47. 62. 25. 63. 28. 22. 46. 48. 70. 18. 31. 29. 32. 21. 51. 49. 60. 39.
 34. 40. 77. 27. 35. 55. 23. 43. 50. 76. 57. 44. 54. 64. 72. 52. 59. 58.
 71. 78. 69. 82. 65. 79. 73. 75. 67. 88. 85. 74. 81. 83. 87. 91. 86. 80.
 98. 90. 84. 93. 95. 92. 89. 94. 96. 99. 97.]
In [431]:
#'U' is already used to classify unknown variables. Using 0 as a placeholder allows me to maintain data integrity since a very large portion of the age column has null values.
df['Race'] = df['Race'].fillna('U')
df['Sex'] = df['Sex'].fillna('U')
df['Age'] = df['Age'].fillna(0)
In [432]:
print(df['Age_Range'].isnull().sum())
26281
In [433]:
#These are only null because the age was unknown.
df['Age_Range'] = df['Age_Range'].fillna('UNSPECIFIED')
In [434]:
df.isnull().sum()
Out[434]:
Report_No             0
Reported_Date         0
Reported_Time         0
From_Date             0
From_Time             0
To_Date               0
To_Time               0
Offense               0
IBRS                  0
Description           0
Beat                  0
Address               0
City                  0
Zip Code              0
Rep_Dist              0
Area                  0
DVFlag                0
Involvement           0
Race                  0
Sex                   0
Age                   0
Fire Arm Used Flag    0
Location              0
Age_Range             0
dtype: int64

All zeros confirm that every null value has been accounted for.

In [436]:
display(df['Location'])
0                               UNSPECIFIED
1                 POINT (-94.5856 39.06537)
2                               UNSPECIFIED
3        POINT (-94.572100389 39.280825631)
4                               UNSPECIFIED
                        ...                
95927            POINT (-94.54769 39.03518)
95928            POINT (-94.59937 39.00477)
95929            POINT (-94.58967 39.03483)
95930                           UNSPECIFIED
95931    POINT (-94.542065027 39.017103984)
Name: Location, Length: 95932, dtype: object

Observation

  • I need to separate the longitude and latitude to make using Folium easier.
In [438]:
#Step 1: Handle "UNSPECIFIED" by keeping those rows as NaN in a new column. Strings in these columns would be problematic for numeric Pandas functions.
df['Lat_Lon'] = df['Location'].where(~df['Location'].str.contains('UNSPECIFIED'), np.nan)

#Step 2: Remove the word "POINT" and parentheses for valid rows
df['Lat_Lon'] = df['Lat_Lon'].str.replace('POINT ', '', regex=False).str.strip('()')

#Step 3: Split into Longitude and Latitude (WKT POINT lists longitude first)
df[['Longitude', 'Latitude']] = df['Lat_Lon'].str.split(' ', expand=True)

#Step 4: Convert Latitude and Longitude to floats
df['Longitude'] = pd.to_numeric(df['Longitude'], errors='coerce')
df['Latitude'] = pd.to_numeric(df['Latitude'], errors='coerce')

#Step 5: Drop the 'Lat_Lon' column
df = df.drop(columns=['Lat_Lon'])

display(df.head())
Report_No Reported_Date Reported_Time From_Date From_Time To_Date To_Time Offense IBRS Description Beat Address City Zip Code Rep_Dist Area DVFlag Involvement Race Sex Age Fire Arm Used Flag Location Age_Range Longitude Latitude
0 KC24000270 2024-01-02 04:53:00 2024-01-02 04:53:00 2222-01-01 00:00:00 Trespass of Real Property 90J Trespass of Real Property 113.0 1200 MAIN ST KANSAS CITY 64106.0 PJ1087 CPD False VIC U U 0.0 False UNSPECIFIED UNSPECIFIED NaN NaN
1 KC24000567 2024-01-03 16:30:00 2023-12-31 16:30:00 2222-01-01 00:00:00 Stealing from Building/Residence 23H All Other Larceny 132.0 3400 MAIN ST KANSAS CITY 64111.0 PJ2753 CPD False VIC U U 0.0 False POINT (-94.5856 39.06537) UNSPECIFIED -94.5856 39.065370
2 KC24000877 2024-01-04 15:15:00 2024-01-04 15:15:00 2222-01-01 00:00:00 Vehicular - Non-Injury Hit and Run UNSPECIFIED UNSPECIFIED 123.0 W I 670 HWY and W I 70 HWY KANSAS CITY UNSPECIFIED UNSPECIFIED CPD False SUS B M 19.0 False UNSPECIFIED 18-24 NaN NaN
3 KC24001196 2024-01-06 00:48:00 2024-01-06 00:48:00 2222-01-01 00:00:00 Stolen Auto 240 Motor Vehicle Theft 642.0 10300 N CHERRY DR KANSAS CITY UNSPECIFIED UNSPECIFIED SCP False CMP VIC W M 24.0 False POINT (-94.572100389 39.280825631) 18-24 -94.5721 39.280826
4 KC24001447 2024-01-07 07:00:00 2024-01-07 07:00:00 2222-01-01 00:00:00 Assault (Aggravated) 26C Impersonation 115.0 00 W PERSHING RD KANSAS CITY 64108.0 PJ1831 CPD False VIC OTH W M 56.0 False UNSPECIFIED 55-64 NaN NaN

This concludes the pre-processing.

GEOSPATIAL ANALYSIS¶

In [441]:
#Filter out the rows with null coordinates; .copy() avoids chained-assignment warnings on the edits below
geo_data = df.dropna(subset=['Latitude', 'Longitude']).copy()
geo_data['Latitude'] = pd.to_numeric(geo_data['Latitude'], errors='coerce')
geo_data['Longitude'] = pd.to_numeric(geo_data['Longitude'], errors='coerce')
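Before mapping, a coordinate sanity check can catch geocoding errors, since points far outside the Kansas City metro are likely bad data. A sketch on invented points, using rough bounding-box values I chose myself (not official city limits):

```python
import pandas as pd

pts = pd.DataFrame({
    'Latitude':  [39.06537, 0.0, 39.280826],
    'Longitude': [-94.5856, 0.0, -94.5721],
})

# Rough bounds around the KC metro (assumed values for illustration).
in_bounds = (
    pts['Latitude'].between(38.8, 39.5)
    & pts['Longitude'].between(-95.0, -94.3)
)
print(f"{(~in_bounds).sum()} point(s) fall outside the metro bounding box")
pts = pts[in_bounds]
```

Applied to `geo_data`, this would flag any (0, 0) or otherwise mis-geocoded rows before they distort the map center.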

Crime Heatmap¶

  • Insights:
    • Highlights overall crime density across Kansas City, with areas of higher activity visually intensified.
  • Applications:
    • Useful for police resource allocation.
    • Can inform community awareness campaigns about high-crime areas.
In [443]:
#Create a map centered around Kansas City, MO
center_lat = geo_data['Latitude'].mean()
center_lon = geo_data['Longitude'].mean()
crime_map = folium.Map(location=[center_lat, center_lon], zoom_start=11)

#Create the data for the heatmaps
heatmap_data = geo_data[['Latitude', 'Longitude']].values.tolist()

HeatMap(heatmap_data, radius=8).add_to(crime_map)

crime_map.save('crime_heatmap.html')

crime_map
Out[443]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Observations:¶

  • High-density crime areas are clustered in specific zones, likely urban centers or regions with higher foot traffic.
  • Outlying areas show significantly less crime, suggesting suburban or rural safety.
  • The heatmap effectively identifies regions for priority policing or interventions.

Crime Hotspot Map (with Markers)¶

  • Insights:
    • Specific markers indicate hotspots with detailed popups showing the number of incidents per area.
  • Applications:
    • Enhances the ability to drill down into critical zones for actionable insights.
    • Can guide decision-making for deploying patrol units or crime prevention initiatives.
In [446]:
# Group by 'Area' to find top hotspots
hotspots = geo_data.groupby('Area').size().sort_values(ascending=False)

print("Hotspots by Incident Count:")
print(hotspots)

#Add markers for the top hotspots
for area, count in hotspots.items():
    #Get the mean coordinates for each area
    coords = geo_data[geo_data['Area'] == area][['Latitude', 'Longitude']].mean()
    
    #Add marker to the map
    folium.Marker(
        location=[coords['Latitude'], coords['Longitude']],
        popup=f"Area: {area}<br>Incidents: {count}",
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(crime_map)

#Add layer control for visibility enhancement
folium.LayerControl().add_to(crime_map)

crime_map.save('crime_hotspots_map.html')

crime_map
Hotspots by Incident Count:
Area
CPD            27471
EPD            21862
MPD            18554
SPD             9419
NPD             8747
SCP             7396
OSPD             343
UNSPECIFIED        3
dtype: int64
Out[446]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Observations:¶

  • The markers provide precise geographic details for the most incident-heavy areas.
  • Popups displaying the number of incidents make it easy to identify the severity of each hotspot.
  • Spatial clustering around markers may indicate systemic issues in these areas, such as socioeconomic disparities or inadequate policing.

Layered Heatmaps by Area¶

  • Insights:
    • Individual heatmap layers allow toggling between areas for a focused analysis of specific regions.
  • Applications:
    • Helps stakeholders prioritize resources based on area-specific trends.
    • Enables dynamic visualization in presentations or dashboards.
In [449]:
#Create a base map
crime_map = folium.Map(location=[39.0997, -94.5786], zoom_start=11)

#Loop through each area and create a separate heatmap layer
for area, group in geo_data.groupby('Area'):
    #Skip the area if it has no usable coordinates (a safeguard; geo_data is already filtered)
    if group[['Latitude', 'Longitude']].dropna().empty:
        continue
        
    #FeatureGroup
    area_group = folium.FeatureGroup(name=f"Area: {area}")
    
    #Add a heatmap to the FeatureGroup
    HeatMap(
        group[['Latitude', 'Longitude']].dropna().values.tolist(),
        radius=10, blur=12, min_opacity=0.4
    ).add_to(area_group)
    
    # Add the FeatureGroup to the map
    crime_map.add_child(area_group)

#Add LayerControl for toggling areas
folium.LayerControl().add_to(crime_map)

crime_map.save('layered_heatmap_by_area.html')

display(crime_map)

#Plot the data in a bargraph to see the numbers
division_counts = df['Area'].value_counts()

fig = px.bar(
    x=division_counts.index,
    y=division_counts.values,
    color=division_counts.index,
    labels={'x': 'Patrol Division', 'y': 'Number of Incidents'},
    title='Crime Incidents by Patrol Division',
    text=division_counts.values
)

fig.update_traces(
    textposition='outside',
    textfont=dict(size=10, color='black')
)

fig.update_layout(
    xaxis_title="Patrol Division",
    yaxis_title="Number of Incidents",
    margin=dict(t=50, b=100),
    xaxis=dict(tickangle=-45)
)

fig.show()
Make this Notebook Trusted to load map: File -> Trust Notebook

Observations:¶

  • Heatmaps for individual areas show unique crime patterns, which may vary by the local environment or demographics.
  • Some areas exhibit significant crime density even within smaller neighborhoods, indicating localized issues.
  • Interactive toggling allows deeper insights without overwhelming visual clutter.

Combined Gender-Specific Heatmaps¶

  • Insights:
    • Separate heatmaps for male and female-related crimes, using distinct gradients for clarity.
  • Applications:
    • Provides insights into gender-based crime distribution.
    • Supports gender-focused interventions and policy-making.
In [452]:
gradients = {
    'M': {0.4: 'steelblue', 0.7: 'blue', 1: 'darkblue'},  # Male: Blue gradient
    'F': {0.4: 'pink', 0.7: 'hotpink', 1: 'deeppink'}     # Female: Pink gradient
}

gender_map = folium.Map(location=[39.0997, -94.5786], zoom_start=11)

#Add heatmaps for both genders on the same map
for gender in ['M', 'F']:
    gender_data = df[df['Sex'] == gender][['Latitude', 'Longitude']].dropna()
    
    #Skip if no valid data
    if gender_data.empty:
        print(f"No data available for gender: {gender}")
        continue
    
    #Add heatmap directly to the map
    HeatMap(
        gender_data.values.tolist(),
        radius=12,
        blur=15,
        gradient=gradients[gender],
        min_opacity=0.5,
    ).add_to(gender_map)
#LayerControl
folium.LayerControl().add_to(gender_map)

gender_map.save('combined_gender_heatmap.html')

display(gender_map)

#Calculate counts for each gender
gender_counts = df['Sex'].value_counts()

fig = px.bar(
    x=gender_counts.index,
    y=gender_counts.values,
    color=gender_counts.index,
    labels={'x': 'Gender', 'y': 'Number of Incidents'},
    title='Crime Incidents by Gender',
    text=gender_counts.values
)

fig.update_traces(
    textposition='outside',
    textfont=dict(size=12, color='black') 
)

fig.update_layout(
    xaxis_title="Gender",
    yaxis_title="Number of Incidents",
    margin=dict(t=50, b=100),
    xaxis=dict(tickangle=0)
)

fig.show()
Make this Notebook Trusted to load map: File -> Trust Notebook

Observations:¶

  • Crime patterns differ by gender, with certain areas showing higher incidents for either males or females.
  • Male-related crimes tend to cluster around high-traffic regions, while female-related crimes may indicate domestic or targeted violence hotspots.
  • The contrasting gradients make it easy to visualize gendered crime distribution.

Top 5 Crimes Heatmap¶

  • Insights:
    • Heatmaps for the most frequent crimes (e.g., Assault, Theft) with distinct colors.
  • Applications:
    • Visual tool to understand the spatial prevalence of specific crimes.
    • Facilitates targeted crime prevention strategies.
In [455]:
#No 'Unspecified' allowed
filtered_data = df[df['Description'] != 'UNSPECIFIED']

#Find the top 5 crimes
top_5_crimes = filtered_data['Description'].value_counts().head(5).index

#Filter location data for top 5 crimes
top_5_data = df[df['Description'].isin(top_5_crimes)][['Latitude', 'Longitude', 'Description']].dropna()

#Colors for each crime
gradients = {
    'Simple Assault': {0.4: 'lightblue', 0.7: 'blue', 1: 'darkblue'},
    'Motor Vehicle Theft': {0.4: 'lightgreen', 0.7: 'green', 1: 'darkgreen'},
    'Vandalism/Destruction of Property': {0.4: 'lightcoral', 0.7: 'red', 1: 'darkred'},
    'Aggravated Assault': {0.4: 'khaki', 0.7: 'gold', 1: 'darkgoldenrod'},
    'Shoplifting': {0.4: 'plum', 0.7: 'purple', 1: 'darkviolet'},
}

#Create the base map
crimes_map = folium.Map(location=[39.0997, -94.5786], zoom_start=11)

#Add a heatmap for each crime
for crime in top_5_crimes:
    crimes_data = top_5_data[top_5_data['Description'] == crime][['Latitude', 'Longitude']]
    
    if crimes_data.empty:
        continue
    
    HeatMap(
        crimes_data.values.tolist(),
        radius=10,
        blur=8,
        min_opacity=0.3,
        gradient=gradients.get(crime, {0.4: 'lightblue', 0.65: 'blue', 1: 'darkblue'}),
        name=crime
    ).add_to(crimes_map)

folium.LayerControl().add_to(crimes_map)

crimes_map.save('top_5_crimes_heatmap.html')

display(crimes_map)
Make this Notebook Trusted to load map: File -> Trust Notebook
In [456]:
#Calculate the counts for the top 5 crimes
top_5_crime_counts = df[df['Description'].isin(top_5_crimes)]['Description'].value_counts()

fig = px.bar(
    x=top_5_crime_counts.index,
    y=top_5_crime_counts.values,
    color=top_5_crime_counts.index,
    labels={'x': 'Crime Type', 'y': 'Number of Incidents'},
    title='Top 5 Crimes by Frequency',
    text=top_5_crime_counts.values
)

fig.update_traces(
    textposition='outside',
    textfont=dict(size=12, color='black') 
)

fig.update_layout(
    xaxis_title="Crime Type",
    yaxis_title="Number of Incidents",
    margin=dict(t=50, b=100),
    xaxis=dict(tickangle=-45)  
)

fig.show()

Observations:¶

  • Each crime type has distinct clustering patterns, suggesting environmental factors contributing to specific crimes.
  • For instance, motor vehicle thefts might concentrate near parking lots or highways, while assaults could occur more in residential or nightlife districts.
  • The color-coding for each crime type makes cross-comparison straightforward.

Domestic Violence Heatmap¶

  • Insights:
    • Focuses on areas with high incidences of domestic violence (DVFlag=True).
  • Applications:
    • Guides community services to address domestic violence hotspots.
    • Enables specialized intervention programs.
In [459]:
#Filter data where DVFlag is True
dv_data = geo_data[geo_data['DVFlag'] == True]

dv_data = dv_data.dropna(subset=['Latitude', 'Longitude'])

dv_map = folium.Map(location=[39.0997, -94.5786], zoom_start=11)

heatmap_data = dv_data[['Latitude', 'Longitude']].dropna().values.tolist()

#Add a heatmap for domestic violence incidents
HeatMap(
    heatmap_data,
    radius=10, blur=12, min_opacity=0.4
).add_to(dv_map)

dv_map.save('dv_flag_heatmap.html')

dv_map
Out[459]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [460]:
dvflag_counts = df['DVFlag'].value_counts()

custom_labels = {True: 'Domestic Violence', False: 'Non-Domestic Violence'}

fig = px.bar(
    x=[custom_labels[val] for val in dvflag_counts.index],
    y=dvflag_counts.values,
    color=[custom_labels[val] for val in dvflag_counts.index],
    labels={'x': 'Incident Type', 'y': 'Number of Incidents'},
    title='Incidents by DVFlag',
    text=dvflag_counts.values  
)

fig.update_traces(
    textposition='outside', 
    textfont=dict(size=12, color='black') 
)

fig.update_layout(
    xaxis_title="Incident Type",
    yaxis_title="Number of Incidents",
    margin=dict(t=50, b=100),
    xaxis=dict(tickangle=0) 
)

fig.show()

Observations:¶

  • Domestic violence hotspots tend to appear in residential zones, highlighting areas where such incidents are prevalent.
  • These areas may benefit from targeted community outreach programs or increased support services.
  • The distribution reinforces the need for data-driven interventions to address domestic violence.

Hotspot Map by Streets¶

  • Insights:
    • Identifies specific streets within each area with high incident counts.
  • Applications:
    • Informs street-level safety measures such as increased lighting or surveillance.
    • Assists city planning to improve safety infrastructure.
In [463]:
map_center = [geo_data['Latitude'].mean(), geo_data['Longitude'].mean()]
division_streets_map = folium.Map(location=map_center, zoom_start=11)

#Step 1: Extract the street name by stripping the leading house number and whitespace
geo_data['Street'] = geo_data['Address'].str.replace(r'^\d+\s*', '', regex=True)

#Step 2: Filter out rows with "UNSPECIFIED" in the Street or Area column
geo_data = geo_data[~geo_data['Street'].str.contains("UNSPECIFIED", na=False)]
geo_data = geo_data[~geo_data['Area'].str.contains("UNSPECIFIED", na=False)]

#Step 3: Ensure no null values exist in required columns
geo_data = geo_data.dropna(subset=['Area', 'Latitude', 'Longitude'])

#Step 4: Group by 'Area' and 'Street' to find incident counts
area_street_hotspots = (
    geo_data.groupby(['Area', 'Street'])
    .size()
    .reset_index(name='Incident Count')
    .sort_values(by=['Area', 'Incident Count'], ascending=[True, False])
)

#Step 5: Get the top 5 streets per area
top_street_per_area = (
    area_street_hotspots.groupby('Area', group_keys=False)
    .apply(lambda x: x.head(5))
    .reset_index(drop=True)
)

#Add a heatmap layer for each Area's top 5 streets
for area in top_street_per_area['Area'].unique():
    area_data = geo_data[geo_data['Area'] == area]
    top_streets = top_street_per_area[top_street_per_area['Area'] == area]['Street']
    area_data = area_data[area_data['Street'].isin(top_streets)]
    
    heatmap_layer = HeatMap(area_data[['Latitude', 'Longitude']].dropna().values, name=f"Hotspots in {area}")
    division_streets_map.add_child(heatmap_layer)

folium.LayerControl().add_to(division_streets_map)
division_streets_map.save('hotspot_map.html')

division_streets_map
Out[463]:
[Folium map output: layered heatmaps of the top 5 streets per area]
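A side note on Step 5 in the cell above: because area_street_hotspots is already sorted by area and descending incident count, DataFrameGroupBy.head can select the top rows per group directly, without the apply round-trip (which newer pandas versions may warn about). A minimal sketch with toy data:

```python
import pandas as pd

#Toy stand-in for area_street_hotspots: already sorted by Area (ascending)
#and Incident Count (descending), as produced in Step 4 above
hotspots = pd.DataFrame({
    "Area": ["Central"] * 3 + ["East"] * 3,
    "Street": ["Main St", "Oak Ave", "Pine Rd", "Elm St", "1st St", "2nd St"],
    "Incident Count": [30, 20, 10, 25, 15, 5],
})

#head(n) on a GroupBy keeps the first n rows of each group in their
#existing order, so no lambda is needed when the frame is pre-sorted
top2 = hotspots.groupby("Area").head(2)
print(top2)
```

The result matches the apply-based version row for row; the pre-sorting does the real work in both cases.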
In [464]:
#I could not get the Plotly loop working for this chart, so I switched to Matplotlib subplots
#Find the unique areas
unique_areas = top_street_per_area['Area'].unique()

#Set up the subplot grid
num_areas = len(unique_areas)
fig, axes = plt.subplots(num_areas, 1, figsize=(10, 6.5 * num_areas), constrained_layout=True)

colors = plt.cm.tab20(np.linspace(0, 1, 10)) 

#Create a bar chart for each area
for i, area in enumerate(unique_areas):
    ax = axes[i] if num_areas > 1 else axes 
    area_data = top_street_per_area[top_street_per_area['Area'] == area]

    bars = ax.bar(
        area_data['Street'],
        area_data['Incident Count'],
        color=colors[:len(area_data)],
        edgecolor='black'
    )

    for bar in bars:
        height = bar.get_height()
        ax.annotate(
            f"{height}",
            xy=(bar.get_x() + bar.get_width() / 2, height),
            xytext=(0, 5),
            textcoords="offset points",
            ha='center',
            va='bottom',
            fontsize=10,
            color='black',
            weight='bold'
        )
    
    ax.set_title(f"Top 5 Streets in {area}", fontsize=14)
    ax.set_xlabel("Street", fontsize=12)
    ax.set_ylabel("Incident Count", fontsize=12)
    ax.set_xticks(range(len(area_data['Street'])))
    ax.set_xticklabels(area_data['Street'], rotation=45, ha='right')

plt.show()
[Bar charts: top 5 streets by incident count for each area]

Observations:¶

  • Certain streets consistently show high incident counts, possibly pointing to contributing factors such as poor lighting or heavy foot traffic.
  • The top streets per area reveal patterns of recurring issues that could be linked to infrastructure or local activity.
  • Concentrating efforts on these streets could yield significant safety improvements.

Conclusion and Final Thoughts¶

The analysis provides an overview of crime patterns in Kansas City, leveraging geospatial and categorical insights. Key findings include:

  1. Crime Distribution:

    • High-density crime areas are concentrated in urban centers and specific hotspots, with distinct clusters for different crime types and demographics.
  2. Area-Specific Trends:

    • Layered heatmaps and hotspot markers reveal significant variations in crime patterns across areas, enabling targeted interventions and resource allocation.
  3. Gendered Crime Insights:

    • Gender-specific heatmaps show differing patterns of male and female-related crimes, with implications for gender-focused safety measures.
  4. Top Crime Types:

    • Certain crimes, such as assaults and thefts, dominate the dataset and exhibit distinct spatial clustering, which could inform preventative measures.
  5. Domestic Violence Prevalence:

    • Domestic violence hotspots highlight areas in need of focused community support and outreach programs.
  6. Street-Level Analysis:

    • Identifying specific streets with high incidents offers actionable insights for localized improvements, such as increased lighting or law enforcement presence.
  7. Visual Insights Through Bar Graphs:

    • Bar graphs complement the geospatial analysis by quantifying disparities in patrol divisions, gender-based incidents, and top crime types, providing clarity for decision-making.

Implications:¶

  • This analysis emphasizes the importance of data-driven decision-making in law enforcement, community safety, and urban planning.
  • Stakeholders can use these findings to allocate resources more effectively, design targeted interventions, and implement long-term safety improvements.

Future Directions:¶

  • Integrating predictive modeling to forecast crime trends.
  • Combining socioeconomic data for deeper insights into crime drivers.
  • Enhancing public awareness by making these visualizations accessible to the community.
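The first of these directions can be sketched with a simple baseline: a lag-feature linear model over daily incident counts. Everything below is an illustrative assumption, not part of the analysis above; the data is synthetic, and in practice one would aggregate the KCPD data by Reported_Date before fitting.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

#Synthetic daily incident counts with a weekly cycle plus noise
#(a stand-in for geo_data.groupby('Reported_Date').size())
rng = np.random.default_rng(42)
dates = pd.date_range("2024-01-01", periods=120, freq="D")
counts = 50 + 10 * np.sin(2 * np.pi * dates.dayofweek / 7) + rng.normal(0, 3, 120)
ts = pd.Series(counts, index=dates)

#Lag features: yesterday's count and the count one week earlier
df = pd.DataFrame({"y": ts, "lag1": ts.shift(1), "lag7": ts.shift(7)}).dropna()
model = LinearRegression().fit(df[["lag1", "lag7"]], df["y"])

#One-step-ahead forecast for the day after the series ends
X_next = pd.DataFrame({"lag1": [ts.iloc[-1]], "lag7": [ts.iloc[-7]]})
next_pred = model.predict(X_next)[0]
print(f"Forecast for next day: {next_pred:.1f} incidents")
```

A real forecasting effort would need holdout evaluation, seasonality terms, and likely a dedicated model, but even a baseline like this gives something to measure richer models against.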

Final Thoughts¶

Initially, this notebook began as an exploratory data analysis, incorporating Folium maps. However, after a few days, it became clear that a more targeted approach was necessary to avoid getting overwhelmed by the data. Leveraging the available location data, I shifted my focus to map visualizations. While this dataset offers opportunities for deeper analysis, this exercise was primarily about improving my relationship with Folium. It has certainly increased my comfort with the library, and I look forward to exploring more advanced geospatial techniques in the future.
